1. Foreword

The project has been a great success:

• A major breakthrough in neural network training has been achieved. The novel risk-averting training method is effective in avoiding poor local minima of the training criterion.

• Mathematical justification and numerical confirmation of neural networks with long- and short-term memories for general adaptive processing have been accomplished.

• Mathematical justification, intuitive understanding and numerical confirmation of risk-averting neural networks for general robust processing with various degrees of robustness have been obtained.

• Robust neural filters have been mathematically justified and numerically tested.

• General adaptive filtering and general robust adaptive filtering turned out to be much more difficult than expected. Nevertheless, neural-computing schemes for both, which are mathematically natural and convincing, have finally been conceived.

2. Statement of the problem studied

If the signal or measurement process involved in a filtering problem has an uncertain environmental parameter that is either observable, or unobservable but constant long enough for adaptation, an adaptive filter is required. If the environmental parameter is unobservable and changes too fast for adaptation, then the filter needs to be robust with respect to the environmental parameter to avoid excessively large or disastrous filtering errors. The main objective of the project is to develop systematic and general methods of designing practical robust and/or adaptive filters with virtually optimal online performance. Such filters are much needed in many applications of great practical importance.

Novel neural networks with long- and short-term memories and novel robust neural networks were proposed as building blocks of the adaptive and the robust filters, respectively. Fundamental issues in training these neural networks, and their effectiveness in approximating dynamical systems, have been studied in detail during the contract period, and breakthrough results have been obtained. Based on these results, schemes of robust and/or adaptive filtering have been conceived and are under mathematical analysis and numerical testing.

3. Summary of the most important results


Results that support the conclusions stated in the Foreword above are summarized below, publication by publication.


• Mathematical Justification of Risk-Sensitive Neural Filtering, Proceedings of the 2000 Conference on Information Sciences and Systems, pp. WA1-7 to WA1-11, March 2000, Princeton, New Jersey.

For general signal and measurement processes under mild regularity conditions, the optimal filtering performance with respect to a general risk-sensitive criterion can be approximated to any accuracy by a recurrent neural network trained with a risk-averting criterion. This provides a mathematical justification of robust neural filtering.

• An Adaptive Method of Training Multilayer Perceptrons, Proceedings of the 2001 International Joint Conference on Neural Networks, pp. 2013-2018, Washington, D.C.

This paper proposes a method of training MLPs that selects the training criterion best suited to the function to be approximated, the measurement noises and the sampling distribution of the training data, and that, perhaps most importantly, avoids poor local minima.

Numerical examples demonstrate that the proposed training method enables the multilayer perceptron under training to capture fine features and under-represented segments of the function being approximated. Perhaps most importantly, the method is good at avoiding getting trapped in a poor local minimum.
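The gradient of a risk-averting criterion helps explain why it tends to pull fine features and under-represented segments into the fit. The sketch below assumes the commonly cited form J_λ(w) = (1/K) Σ_k exp(λ e_k(w)), where e_k(w) is the squared error on sample k and λ > 0 is the risk-sensitivity index. This form, the residual values and the function name `risk_averting_weights` are assumptions for illustration, not taken from the paper. Differentiating gives ∇J_λ = (λ/K) Σ_k exp(λ e_k) ∇e_k. Each sample's error gradient is therefore weighted in proportion to exp(λ e_k), whereas the quadratic criterion weights all samples equally.

```python
import math

def risk_averting_weights(residuals, lam):
    """Relative weight of each sample in the gradient of the (assumed)
    risk-averting criterion J = mean(exp(lam * r**2)).

    Sample k's error gradient is scaled by exp(lam * r_k**2); the weights are
    normalized to sum to 1, and a max-shift keeps exp() from overflowing.
    lam = 0 gives uniform weights, i.e. the least-squares direction in the limit.
    """
    exponents = [lam * r * r for r in residuals]
    top = max(exponents)
    raw = [math.exp(a - top) for a in exponents]
    total = sum(raw)
    return [v / total for v in raw]

# Hypothetical residuals: nine well-fitted samples and one sample from an
# under-represented segment that the current fit misses by 1.0.
residuals = [0.1] * 9 + [1.0]
for lam in (0.0, 1.0, 10.0):
    w = risk_averting_weights(residuals, lam)
    print(f"lambda={lam:5.1f}  weight on the poorly fitted sample: {w[-1]:.3f}")
```

As λ grows, the poorly fitted sample comes to dominate the update direction. This only illustrates the weighting effect. It does not reproduce the paper's adaptive selection of the criterion.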

• Training Multilayer Perceptrons in the Presence of Measurement Outliers, Proceedings of the 2001 International Joint Conference on Neural Networks, pp. 2030-2035, Washington, D.C.

Outlying measurement noises in the training data may distort an MLP trained on those data with the ordinary quadratic criterion. Much research has been done to reduce such effects of outlying measurement noise. Robust estimation criteria from statistics, such as the logistic, Huber, Tukey biweight and Talwar criteria, have been used for training neural networks. However, there are serious difficulties in selecting suitable initial MLP weights and a suitable value of the scale estimator.

The purpose of this paper is to propose an alternative training method free of such difficulties. Numerical examples show that when there are outlying measurement noises in the training data, and the function under approximation is reasonably smooth and has no under-represented segment, the proposed adaptive method of training MLPs works effectively.
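The scale difficulty mentioned above can be made concrete. The sketch below uses the standard textbook forms of three of the robust criteria named there (Huber, Talwar and Tukey's biweight), each with a scale parameter c. The logistic criterion is omitted, and these parametrizations are assumptions rather than the exact ones used in the paper. With a hypothetical outlying residual of 10, whether a criterion actually suppresses the outlier depends on how c compares with the residual scale.

```python
def quadratic(r, c):
    # Ordinary least-squares loss; c is unused, kept for a uniform signature.
    return 0.5 * r * r

def huber(r, c):
    # Quadratic for |r| <= c, linear growth beyond c.
    a = abs(r)
    return 0.5 * r * r if a <= c else c * (a - 0.5 * c)

def talwar(r, c):
    # Quadratic for |r| <= c, constant beyond c (large residuals are ignored).
    return 0.5 * r * r if abs(r) <= c else 0.5 * c * c

def tukey_biweight(r, c):
    # Smoothly redescending; saturates at c**2 / 6 for |r| >= c.
    if abs(r) >= c:
        return c * c / 6.0
    u = (r / c) ** 2
    return c * c / 6.0 * (1.0 - (1.0 - u) ** 3)

outlier = 10.0  # hypothetical outlying residual
for c in (1.0, 5.0, 20.0):
    losses = {f.__name__: f(outlier, c) for f in (quadratic, huber, talwar, tukey_biweight)}
    print(f"c={c:5.1f}", {k: round(v, 3) for k, v in losses.items()})
```

With c = 20, which is larger than the outlier itself, Huber and Talwar reduce to the quadratic loss and give the outlier full weight. With c = 1, all three criteria cap its influence. Choosing c well therefore requires a reliable scale estimate, which is hard to obtain before the network has been trained.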

• Adaptive versus Accommodative Neural Networks for Adaptive System Identification, Proceedings of the 2001 International Joint Conference on Neural Networks, pp. 1279-1284, Washington, D.C.

It has been proven that MLPs and RMLPs (recurrent MLPs) with LASTMs (long- and short-term memories) are, under mild regularity conditions, universal series-parallel and parallel identifiers of dynamical systems with an environmental parameter. This paper demonstrates the numerical feasibility of these mathematically proven results with numerical examples. In the same examples, accommodative neural networks are also obtained that have the same accuracy on the training data as the networks with LASTMs. The generalization abilities of the two types of network are then compared. The comparison shows that the networks with LASTMs are superior to the accommodative networks with respect to both criteria used in the comparison.
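The distinction between series-parallel and parallel identifiers follows standard system-identification usage. A series-parallel identifier predicts the next output from the measured past outputs, while a parallel identifier feeds back its own past predictions. The sketch below shows why parallel identification is the more demanding test: one-step errors stay small, but free-running errors accumulate. It uses a hypothetical first-order plant and a hypothetical identified model with a small parameter error. Both are linear for clarity, not neural networks.

```python
def plant(y, u):
    # Hypothetical true system: y(t+1) = 0.5*y(t) + u(t).
    return 0.5 * y + u

def model(y, u):
    # Hypothetical identified model with a 10% error in the feedback gain.
    return 0.55 * y + u

def simulate_plant(inputs, y0=0.0):
    ys = [y0]
    for u in inputs:
        ys.append(plant(ys[-1], u))
    return ys

def series_parallel(inputs, measured):
    # One-step-ahead prediction: the model sees the measured output y(t).
    return [measured[0]] + [model(measured[t], u) for t, u in enumerate(inputs)]

def parallel(inputs, y0=0.0):
    # Free-running simulation: the model sees its own previous prediction.
    preds = [y0]
    for u in inputs:
        preds.append(model(preds[-1], u))
    return preds

inputs = [1.0] * 20  # hypothetical step input
measured = simulate_plant(inputs)
err_sp = max(abs(p - m) for p, m in zip(series_parallel(inputs, measured), measured))
err_pa = max(abs(p - m) for p, m in zip(parallel(inputs), measured))
print(f"max series-parallel error: {err_sp:.4f}")
print(f"max parallel error:        {err_pa:.4f}")
```

In this toy case the series-parallel error stays below 0.1 at every step. The parallel prediction drifts toward the model's own steady state, about 0.22 away from the plant's.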

• Robust Identification of Dynamical Systems by Neurocomputing, Proceedings of the 2001 International Joint Conference on Neural Networks, pp. 1285-1290, Washington, D.C.

If a dynamical system has a fine feature or an operating condition that is under-represented in the input/output data used for its identification, the ordinary quadratic identification criterion often leads to very large or disastrous identification errors. H-infinity, risk-sensitive and minimax criteria have been used to obtain robust identifiers, mostly for linear systems. A neurocomputing approach to robust identification with respect to general risk-averting criteria was previously proposed and mathematically justified.

This paper studies the numerical feasibility of this approach and provides a method of training neural networks into robust identifiers of dynamical systems. The new training method is tested on both the series-parallel and the parallel identification of modified benchmark systems, using input/output data sets with and without noise. The performances of the resulting robust neural identifiers are compared with those of least-squares neural identifiers trained with the ordinary quadratic criterion. The robust identifiers consistently outperform the least-squares identifiers in the examples.

• Virtually Convex Criteria for Training Neural Networks, Proceedings of the 2001 Conference on Artificial Neural Networks in Engineering, Nov. 4-7, 2001, St. Louis, Missouri.

This paper shows that the risk-averting error criterion is "virtually convex": as the risk-sensitivity index of this criterion increases, the region on which the criterion is convex expands monotonically to the entire weight vector space except the intersection of a finite number of manifolds determined by the training data. As the convexity region expands, "worm holes" are created through which a local-search optimization procedure can travel to a lower local minimum. It is also proved that the risk-averting error criterion approaches a minimax criterion as the risk-sensitivity index increases.
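The minimax limit can be checked numerically. J_λ itself grows without bound as λ increases, so the sketch works with the normalized form (1/λ) ln J_λ, which has the same minimizers for each fixed λ. The form J_λ = (1/K) Σ_k exp(λ e_k) and the sample errors are assumptions for illustration. For λ > 0 this quantity lies between max_k e_k − (ln K)/λ and max_k e_k, so it tends to the largest error, the minimax value. As λ → 0 it tends to the mean error, which is the ordinary quadratic criterion when the e_k are squared errors.

```python
import math

def normalized_risk_averting(errors, lam):
    """(1/lam) * ln(mean(exp(lam * e))) for lam > 0, computed via log-sum-exp."""
    m = max(errors)
    s = sum(math.exp(lam * (e - m)) for e in errors)
    return m + math.log(s / len(errors)) / lam

errors = [0.1, 0.2, 0.5, 1.0]  # hypothetical squared errors e_k at a fixed weight vector
for lam in (1e-6, 1.0, 10.0, 100.0):
    print(f"lambda={lam:g}: {normalized_risk_averting(errors, lam):.4f}")
# For reference: mean error = 0.45, max error = 1.0
```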

• Risk-Averting Criteria for Training Neural Networks, Proceedings of the Eighth International Conference on Neural Information Processing, pp. 476-481, Nov. 14-18, 2001, Shanghai, China.

Essentially the same as the above item.

• Minimization through Convexitization in Training Neural Networks, Proceedings of the 2002 International Joint Conference on Neural Networks, pp. 1558-1563, Honolulu, Hawaii, May 2002.

Essentially the same as the above item.

• Adaptive Multilayer Perceptrons with Long- and Short-Term Memories, IEEE Transactions on Neural Networks, vol. 13, no. 1, pp. 22-33, January 2002.

MLPs with LASTMs (long- and short-term memories) are proposed for adaptive processing. The activation functions of the output neurons of such a network are linear, so the weights in the last layer affect the outputs of the network linearly; they are called linear weights. These linear weights constitute the short-term memory, and the other weights constitute the long-term memory.

It is proven that virtually any function with an input variable and an environmental parameter can be approximated to any accuracy by an MLP with LASTMs whose long-term memory is independent of the environmental parameter. This independence allows the long-term memory to be determined in a priori training, so that only the short-term memory needs to be adjusted online to adapt to the environmental parameter. The benefits of using an MLP with LASTMs include less online computation, no poor local extrema to fall into, and much faster and better adaptation. Numerical examples illustrate that these benefits are realized satisfactorily.
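The sketch below illustrates the structural point behind these benefits. Once the long-term memory (the hidden-layer weights) is fixed, the network output is linear in the short-term memory (the output weights). Adapting to a new value of the environmental parameter is then a linear least-squares problem with a unique global minimum. Everything concrete here is hypothetical:

- the two tanh hidden neurons are hand-picked rather than obtained by a priori training;
- the target function is chosen so that it lies exactly in their span;
- the batch normal-equation solve stands in for whatever online update the paper uses (recursive least squares would be one natural choice).

```python
import math

# Hypothetical long-term memory: hidden-layer (input weight, bias) pairs,
# fixed after a priori training (hand-picked here for illustration).
HIDDEN = [(1.0, 0.0), (2.0, -1.0)]

def hidden_features(x):
    # Outputs of the tanh hidden neurons plus a constant 1 for the output bias.
    return [math.tanh(a * x + b) for a, b in HIDDEN] + [1.0]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def adapt_short_term(xs, ys):
    """Least-squares fit of the linear output weights (short-term memory).

    The output is linear in these weights, so the squared error is a convex
    quadratic in them and the normal equations give its global minimum.
    """
    H = [hidden_features(x) for x in xs]
    n = len(H[0])
    A = [[sum(h[i] * h[j] for h in H) for j in range(n)] for i in range(n)]
    b = [sum(h[i] * y for h, y in zip(H, ys)) for i in range(n)]
    return solve(A, b)

def target(x, theta):
    # Hypothetical function with environmental parameter theta.
    return theta * math.tanh(x) + (1 - theta) * math.tanh(2 * x - 1)

xs = [-2 + 0.25 * k for k in range(17)]
for theta in (0.3, 0.8):
    w = adapt_short_term(xs, [target(x, theta) for x in xs])
    print(theta, [round(v, 6) for v in w])
```

For each θ the solve should recover output weights close to [θ, 1 − θ, 0] without touching the hidden layer. No iterative search is involved, so there is no local minimum to fall into.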

• Robust Approximation of Uncertain Functions where Adaptation is Impossible, Proceedings of the 2002 International Joint Conference on Neural Networks, pp. 1889-1894, Honolulu, Hawaii, May 2002 (with Devasis Bassu).

This paper is concerned with robust approximation of functions with an environmental parameter that changes so fast that adaptation to it is impossible. Approximation with respect to the ordinary least-squares criterion provides a good overall approximation, but at the cost of large approximation errors for some values of the independent variables. An alternative training method using the risk-averting training criterion is proposed that provides robust function approximation. When the measurement noises are negligible or unbiased, the method adaptively adjusts the sensitivity index of the risk-averting criterion to tune to the effects of the uncertain environmental parameter. Numerical examples illustrate the efficacy of the proposed adaptive risk-averting training method in producing function approximations with different degrees of robustness.
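The effect of the sensitivity index on the degree of robustness can be seen in the simplest approximation problem: fitting a single constant c to data. The sketch assumes the risk-averting criterion takes the exponential form Σ_k exp(λ (y_k − c)²). It minimizes a log-sum-exp transform of that criterion, which has the same minimizer and avoids overflow. The minimization uses ternary search, which is valid because the criterion is convex in c. The data and λ values are hypothetical, and the paper's adaptive rule for choosing λ is not reproduced.

```python
import math

def risk_averting_objective(c, data, lam):
    # ln(sum(exp(lam * (y - c)**2))), a monotone transform of the assumed
    # risk-averting criterion, computed stably; it has the same minimizer.
    a = [lam * (y - c) ** 2 for y in data]
    m = max(a)
    return m + math.log(sum(math.exp(v - m) for v in a))

def fit_constant(data, lam, iters=200):
    """Ternary search on [min(data), max(data)]; the objective is convex in c."""
    lo, hi = min(data), max(data)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if risk_averting_objective(m1, data, lam) < risk_averting_objective(m2, data, lam):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

data = [0.0, 0.0, 0.0, 0.0, 10.0]  # hypothetical samples: mean 2, midrange 5
for lam in (1e-4, 0.01, 0.1, 1.0):
    print(f"lambda={lam:g}: estimate {fit_constant(data, lam):.4f}")
```

A small λ gives essentially the least-squares estimate (the mean, 2). A large λ moves the estimate toward the midrange, 5, which minimizes the worst-case error. Intermediate values of λ give intermediate degrees of robustness.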


• Robust Identification of Uncertain Dynamical Systems where Adaptation is Impossible, Proceedings of the 2002 International Joint Conference on Neural Networks, pp. 1956-1961, Honolulu, Hawaii, May 2002 (with Devasis Bassu).

Depending on the application, different degrees of robustness are required for system identification in the presence of an environmental parameter that is unobservable and changes so fast that adaptation is impossible. H-infinity and minimax criteria are too pessimistic for most applications. This paper proposes the risk-averting error criterion and shows that training with respect to it yields robust system identifiers with various degrees of robustness. Numerical results illustrate the efficacy of the proposed method and the effects of different degrees of robustness.

• Existence and Uniqueness of Risk-Sensitive Estimates, scheduled to appear in IEEE Transactions on Automatic Control, November 2002 (with Thomas Wanner).

Existence and uniqueness of conditional expectations, and thus of minimum-variance estimates, are guaranteed by the Radon-Nikodym theorem of measure theory. This paper addresses the existence and uniqueness of robust estimates with respect to the risk-sensitive error for various risk-sensitivity indices. The existence of a unique risk-sensitive estimate is proven provided that the sensitivity index is positive and the power of the absolute deviations involved is greater than 1. For the remaining cases, no general existence result is available; we do, however, prove existence in certain special cases. Moreover, we present examples with uncountably many optimal risk-sensitive estimates, i.e., exhibiting an extremely high level of nonuniqueness.
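A short convexity argument suggests why a positive index and a power greater than 1 make the case tractable. The argument assumes a scalar signal x, measurements y, and a risk-sensitive estimate defined as the minimizer below, with Φ(c) finite for every real c. These definitions are assumptions made for this sketch, and the sketch is not the proof given in the paper.

```latex
\hat{x} \in \arg\min_{c \in \mathbb{R}} \Phi(c), \qquad
\Phi(c) = E\left[ \exp\left( \lambda \lvert x - c \rvert^{p} \right) \,\middle|\, y \right],
\qquad \lambda > 0,\; p > 1.

% Uniqueness: for p > 1, c \mapsto \lambda \lvert x - c \rvert^{p} is strictly convex;
% exp is convex and strictly increasing, so c \mapsto \exp(\lambda \lvert x - c \rvert^{p})
% is strictly convex, and so is its conditional expectation \Phi.
% A strictly convex function has at most one minimizer.

% Existence: \Phi is convex and finite on \mathbb{R}, hence continuous, and
% \Phi(c) \to \infty as \lvert c \rvert \to \infty (by Fatou's lemma),
% so a minimizer exists.
```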

• Recurrent Multilayer Perceptrons for Discrete-Time Dynamic System Identification, under revision.

The ability of two types of recurrent MLP (multilayer perceptron), namely the MLP with interconnected neurons (MLPWIN) and the MLP with output feedbacks (MLPWOF), to approximate discrete-time dynamic systems is studied in the context of system identification.